Criteria for Evaluating Information Extraction Systems

Authors

  • Chia-Hui Chang
  • Mohammed Kayed
  • Moheb Ramzy Girgis
  • Khaled Shaalan
Abstract

The Internet presents a huge amount of useful information, but it is usually formatted for its human users, which makes it difficult to extract relevant data from various sources. Therefore, the availability of robust, flexible Information Extraction (IE) systems that transform Web pages into program-friendly structures will become a necessity. Although many approaches for data extraction from Web pages have been developed, there has been limited effort to compare such tools. In addition to briefly surveying the major data extraction approaches described in the literature, this paper presents three classes of criteria for qualitatively analyzing these approaches. The criteria of the first class concern the difficulty of an IE task, so they can explain why an IE system fails to handle Web sites with particular structures. The criteria of the second class concern the effort the user makes in the training process, so they measure the degree of automation of IE systems. The criteria of the third class concern the techniques used in IE tasks, so they measure the performance of IE systems.

Similar resources

Information Security of the Web-Based Systems of the Iran Public Libraries Foundation

Purpose: This paper aims to evaluate the security of the web-based information systems of the Iran Public Libraries Foundation (IPLF). Methodology: A survey method was used. The data collection tool was a questionnaire based on the ISO/IEC 27002 standard, comprising eleven indicators and 79 sub-criteria, which examines the security of web-based information systems of IP...

Determining the factors affecting the evaluation of business intelligence systems with an emphasis on the integrity of Organizational resources

In the information age, the speed of producing and supplying valuable information is considered one of the keys to success in organizations and institutes. The major objective of this study was to investigate and specify the effective factors in evaluating BI systems with an emphasis on the integrity of organizational resources. First, five factors were determined as the major factors...

An integrated model of fuzzy multi-criteria decision making and stochastic programming for the evaluating and ranking of advanced manufacturing technologies

Investment appraisal in advanced manufacturing technologies (AMTs) has been receiving considerable attention over the past three decades. As stated in numerous studies, traditional engineering economic methods cannot adequately justify investments in AMTs. Thus, besides these methods, other solutions have been proposed in this field. The methods applied in the evaluation of AMTs can be clas...

A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model

Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP), one that has been intensified by the increased volume, heterogeneity, and unstructured form of information. One of the core information extraction tasks is relation extraction wh...

A Probabilistic Model for Determining Text Coherence in Interactive Question Answering Systems

Evaluation plays an important role in interactive question answering (IQA) systems, as in many fields of computational linguistics. The coherence between the questions and the answers exchanged between the user and the system is one of the important criteria in evaluating these systems. In this paper, a new approach to determining the degree of coherence of text generated by IQA systems is presented. Th...

Publication date: 2006